gradient propagation




e2065cb56f5533494522c46a72f1dfb0-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their insightful remarks and comments, which helped to considerably improve our manuscript. We address the most important ones in detail below. Before doing so, we highlight a comment from R3 in order to make an important clarification about the scope of our contribution: "It is well known that an attention mechanism would reduce gradient vanishing. It feels trivial to me as there is a direct connection for gradients to pass."

We are in complete agreement and recognize that the very mechanism of (self-)attention was designed to improve gradient propagation over long sequences, and that sparsity is a good way to keep complexity costs low. Much like work from the '90s established formal results for gradient exploding/vanishing in deep/recurrent networks, we believe it is crucial to establish similar theoretical tools for attention mechanisms, as these methods are under intense development and scalability and complexity are important issues.

The proposed relevancy mechanism and accompanying experiments, building on established work, are meant to illustrate how our theorems can be concretely exploited. We chose simple tasks for their ease of interpretation and their variety of computational demands (memorization, prediction, RL, etc.). As is clearly indicated in the text, it is not our goal to propose this method "as is" in a race for state-of-the-art. We recognize that reviewers may have based their evaluation as they would have in a method paper, and we kindly invite them to reconsider the value of our experiments in the broader context of our theoretical contributions. We also thank the reviewers for their additional minor comments not explicitly addressed here and agree to implement them.

R1: Q: "The authors didn't spell out the relation between κ and d: higher κ tends to have smaller d."
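The rebuttal's core point (a recurrent path multiplies T Jacobians, while attention gives gradients a one-hop path back to any position) can be illustrated numerically. The sketch below is a minimal numpy illustration under our own assumptions (an arbitrary contraction matrix and a uniform attention weight); it is not the paper's relevancy mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 50, 16

# Recurrent path: the gradient w.r.t. the first hidden state is a product of
# T Jacobians. With spectral norm < 1 this product vanishes geometrically.
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)   # force spectral norm 0.5 (a contraction)
grad = np.eye(d)
for _ in range(T):
    grad = W.T @ grad
recurrent_norm = np.linalg.norm(grad)

# Attention-style path: a direct connection from the last step back to the
# first, so the gradient passes through a single attention weight instead.
attn_weight = 1.0 / T             # e.g. uniform attention over T positions
direct_norm = np.linalg.norm(attn_weight * np.eye(d))

print(f"recurrent: {recurrent_norm:.2e}  direct: {direct_norm:.2e}")
```

Even with uniform (maximally diluted) attention, the direct path keeps a polynomially small gradient (1/T), while the recurrent product decays exponentially in T.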


Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

Neural Information Processing Systems

We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as the "adjoint method") for training neural ODEs on classification, density estimation, and inference approximation tasks. We also propose a theoretical justification of our approach using the logarithmic-norm formalism. As a result, our method allows faster model training than the reverse dynamic method, which we confirm and validate through extensive numerical experiments on several standard benchmarks.
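The core idea can be sketched as follows: if the forward trajectory z(t) is sampled at Chebyshev nodes, a polynomial interpolant lets the backward pass query z(t) anywhere without re-integrating the ODE. This is a minimal numpy sketch, not the authors' implementation; a toy ODE with a known closed-form solution stands in for the solver's evaluations.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Toy ODE z'(t) = -z(t), z(0) = 1, with exact solution z(t) = exp(-t).
# In the paper's setting the ODE solver would supply z at the Chebyshev
# nodes; here the exact solution stands in for those solver evaluations.
z = lambda t: np.exp(-t)

# Degree-8 Chebyshev interpolant of the trajectory on [0, 1]
# (Chebyshev.interpolate samples z at Chebyshev points internally).
z_hat = Chebyshev.interpolate(z, 8, domain=[0.0, 1.0])

# The backward pass can now query z_hat(t) at arbitrary times instead of
# re-integrating the ODE, which is the source of the claimed speed-up.
t = np.linspace(0.0, 1.0, 101)
max_err = np.max(np.abs(z_hat(t) - z(t)))
print(max_err)
```

For smooth trajectories the interpolation error decays spectrally in the polynomial degree, so a low-degree interpolant can already be far more accurate than typical solver tolerances.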



Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers

Chertkov, Andrei, Basharin, Artem, Saygin, Mikhail, Frolov, Evgeny, Straupe, Stanislav, Oseledets, Ivan

arXiv.org Artificial Intelligence

The growing demand for energy-efficient, high-performance AI systems has led to increased attention on alternative computing platforms (e.g., photonic, neuromorphic) due to their potential to accelerate learning and inference. However, integrating such physical components into deep learning pipelines remains challenging, as physical devices often offer limited expressiveness, and their non-differentiable nature renders on-device backpropagation difficult or infeasible. This motivates the development of hybrid architectures that combine digital neural networks with reconfigurable physical layers, which effectively behave as black boxes. In this work, we present a framework for the end-to-end training of such hybrid networks. This framework integrates stochastic zeroth-order optimization for updating the physical layer's internal parameters with a dynamic low-rank surrogate model that enables gradient propagation through the physical layer. A key component of our approach is the implicit projector-splitting integrator algorithm, which updates the lightweight surrogate model after each forward pass with minimal hardware queries, thereby avoiding costly full matrix reconstruction. We demonstrate our method across diverse deep learning tasks, including: computer vision, audio classification, and language modeling. Notably, across all modalities, the proposed approach achieves near-digital baseline accuracy and consistently enables effective end-to-end training of hybrid models incorporating various non-differentiable physical components (spatial light modulators, microring resonators, and Mach-Zehnder interferometers). This work bridges hardware-aware deep learning and gradient-free optimization, thereby offering a practical pathway for integrating non-differentiable physical components into scalable, end-to-end trainable AI systems.
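The stochastic zeroth-order part of such a pipeline can be sketched with a simultaneous-perturbation (SPSA) gradient estimator, which needs only two forward queries of the black box per estimate, independent of the parameter dimension. This is a generic hedged illustration (a toy quadratic plays the black-box layer), not the paper's exact estimator or its low-rank surrogate.

```python
import numpy as np

def spsa_grad(f, theta, eps=1e-2, rng=None):
    """SPSA estimate of grad f(theta) using two black-box queries.

    f is treated as a black box (e.g. a physical layer we can only
    evaluate), so no analytic gradient is ever required.
    """
    if rng is None:
        rng = np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher direction
    return (f(theta + eps * delta) - f(theta - eps * delta)) / (2 * eps) * delta

# Toy "black-box layer": a quadratic whose true gradient is 2 * theta.
f = lambda th: np.sum(th ** 2)

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])

# A single estimate is noisy; averaging converges to the true gradient.
est = np.mean([spsa_grad(f, theta, rng=rng) for _ in range(2000)], axis=0)
print(est)   # approaches the true gradient [2.0, -4.0, 1.0]
```

In practice one would use each noisy estimate directly inside an SGD-style update rather than averaging, trading variance for far fewer hardware queries.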



Review for NeurIPS paper: Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

Neural Information Processing Systems

Weaknesses: I take issue with two aspects of this submission that lead me to recommend rejection at this point. The submission claims that the evaluations of z(t) at the Chebyshev grid points can be obtained without additional cost, e.g., at line 108. While this is true in a general sense, the claim ignores several aspects of numerical theory, both in the text and in the code. Runge-Kutta methods only guarantee high-order approximations at their own grid points. If high-order approximations are sought at pre-defined grid points, there are two solutions: (a) the solver is forced to include the pre-defined grid points in its otherwise adaptive mesh, or (b) a particular Runge-Kutta formula with a smooth interpolant (a continuous extension) has to be chosen.
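The reviewer's option (b) corresponds to the dense output provided by modern adaptive solvers: a continuous extension of the Runge-Kutta step that can be queried off the solver's own mesh. A hedged sketch using SciPy (an illustration of the general point, not the submission's code) evaluates such an interpolant at Chebyshev nodes:

```python
import numpy as np
from scipy.integrate import solve_ivp

# An RK solver is only high-order at its own accepted steps. To get z(t)
# at pre-defined Chebyshev nodes, either force the nodes into the mesh
# (via t_eval) or use the solver's continuous extension (dense_output),
# i.e. a smooth-interpolant Runge-Kutta formula.
def rhs(t, z):
    return -z   # toy ODE with exact solution exp(-t)

# Chebyshev nodes (second kind) on [0, 1].
k = np.arange(9)
cheb = 0.5 * (1.0 - np.cos(np.pi * k / 8))

sol = solve_ivp(rhs, (0.0, 1.0), [1.0], method="RK45",
                dense_output=True, rtol=1e-8, atol=1e-10)

# Query the dense-output interpolant at the Chebyshev nodes.
z_cheb = sol.sol(cheb)[0]
err = np.max(np.abs(z_cheb - np.exp(-cheb)))
print(err)
```

Note that the accuracy between accepted steps is governed by the order of the interpolant (quartic for RK45's dense output), not by the step-error tolerance alone, which is exactly the subtlety the review says the submission glosses over.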


Review for NeurIPS paper: Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

Neural Information Processing Systems

The four reviewers, all of whom are domain experts, agree that this is a good paper that delivers a delicate but useful methodological contribution to the growing area of NODEs. It should thus be accepted. However, the reviewers have also raised several suggestions and requests for improvement. Please make sure to address them as thoroughly as possible to ensure this paper reaches its audience.


Contextual Gradient Flow Modeling for Large Language Model Generalization in Multi-Scale Feature Spaces

Quillington, Daphne, Fairbrother, Kingsley, Tattershall, Xavier, Kabakum, Irin

arXiv.org Artificial Intelligence

Optimization methodologies for training large-scale neural architectures often rely on uniform gradient propagation mechanisms that fail to align with hierarchical linguistic structures, limiting their capacity to generalize across diverse language distributions. A structured gradient refinement framework was introduced to incorporate multi-scale contextual adjustments, improving parameter adaptation through dynamic weighting strategies that enhanced representation coherence. Empirical evaluations demonstrated that structured propagation mechanisms contributed to reductions in gradient oscillations, resulting in more stable training dynamics and improved optimization efficiency. The comparative performance assessment indicated that models incorporating hierarchical propagation strategies exhibited greater robustness in long-range dependency retention and cross-domain adaptation. The hierarchical adjustment of weight updates provided an alternative to conventional backpropagation, reducing sensitivity to initialization conditions while improving overall convergence efficiency. The experimental results confirmed that structured gradient propagation influenced representation learning trajectories, aligning parameter updates with broader linguistic dependencies rather than isolated token-level relationships. Statistical evaluations indicated that structured optimization strategies mitigated overfitting while preserving adaptability across heterogeneous text distributions. The findings established that structured gradient propagation provided an empirically validated framework for refining hierarchical representation learning, supporting more effective integration of linguistic dependencies into optimization dynamics.